skextremes.models.engineering

This module contains algorithms found in the literature and used extensively in some fields.

The following paragraphs have been adapted from Makkonen (2006).

The return period of an event of a specific large magnitude is of fundamental interest. All evaluations of the risks of extreme events require methods to statistically estimate their return periods from the measured data. Such methods are widely used, for example, in building codes and in regulations concerning the design of structures and community planning. Furthermore, it is crucial for the safety and economically optimized engineering of future communities to be able to estimate how the frequency of various natural hazards changes with climatic change, and to analyze trends in weather extremes.

The return period \(R\) (in years) of an event is related to the probability \(P\) of not exceeding this event in one year by

\[R=\frac{1}{1 - P}\]
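As a quick numerical illustration of this relation, here is a minimal sketch in plain Python. The function names are hypothetical and are not part of skextremes.

```python
def return_period(p_non_exceedance: float) -> float:
    """Return period R (years) from the annual non-exceedance probability P."""
    if not 0.0 <= p_non_exceedance < 1.0:
        raise ValueError("P must satisfy 0 <= P < 1")
    return 1.0 / (1.0 - p_non_exceedance)


def non_exceedance_probability(return_period_years: float) -> float:
    """Inverse relation: P = 1 - 1/R."""
    if return_period_years < 1.0:
        raise ValueError("R must be at least 1 year")
    return 1.0 - 1.0 / return_period_years


print(return_period(0.98))                # approximately 50 years
print(non_exceedance_probability(100.0))  # approximately 0.99
```

An event that is not exceeded in a given year with probability 0.98 therefore has a return period of about 50 years.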

A standard method to estimate \(R\) from measured data is the following. First, one ranks the data, typically annual extremes or values over a threshold, in increasing order of magnitude from the smallest \(m = 1\) to the largest \(m = N\) and associates a cumulative probability \(P\) with the \(m\)th smallest value. Second, one fits a line to the ranked values by some fitting procedure. Third, one interpolates or extrapolates from the graph so that the return period of the extreme value of interest is estimated.
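The three steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the skextremes implementation: it assumes the Weibull plotting position \(P = m / (N + 1)\), a Gumbel reduced variate \(-\ln(-\ln P)\) and an ordinary least-squares line. The function names and the data are made up.

```python
import math


def gumbel_fit_from_ranks(values):
    """Fit a straight line to ranked data; return (loc, scale)."""
    x = sorted(values)                      # step 1: rank, m = 1..N
    n = len(x)
    # Assumed plotting position (Weibull): P = m / (N + 1).
    p = [m / (n + 1) for m in range(1, n + 1)]
    # Gumbel reduced variate, on which a Gumbel CDF plots as a line.
    y = [-math.log(-math.log(pm)) for pm in p]
    # Step 2: ordinary least squares for x = loc + scale * y.
    y_mean = sum(y) / n
    x_mean = sum(x) / n
    sxy = sum((yi - y_mean) * (xi - x_mean) for yi, xi in zip(y, x))
    syy = sum((yi - y_mean) ** 2 for yi in y)
    scale = sxy / syy
    loc = x_mean - scale * y_mean
    return loc, scale


def return_level(loc, scale, period_years):
    """Step 3: read the R-year value off the fitted line."""
    p = 1.0 - 1.0 / period_years
    return loc - scale * math.log(-math.log(p))


# Made-up annual maxima (for example wind speeds in m/s), illustration only.
annual_maxima = [22.1, 25.3, 19.8, 27.4, 23.0, 24.6, 21.5, 26.2, 20.9, 28.7]
loc, scale = gumbel_fit_from_ranks(annual_maxima)
print(loc, scale, return_level(loc, scale, 50))
```

Swapping the plotting position in the second step is the only change needed to reproduce the other variants described in this module.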

In principle, this extreme value analysis method, introduced by Hazen (1914), can be applied directly by using arithmetic paper. However, interpolation and extrapolation can be made more easily when the points fall on a straight line, which is rarely the case in an order-ranked plot of a physical variable on arithmetic paper. Therefore, almost invariably, the analysis is made by modifying the scale of the probability \(P\), and sometimes also that of the random variable \(x\), in such a way that the plot against \(x\) of the anticipated cumulative distribution function \(P = F(x)\) of the variable appears as a straight line. Typically, the Gumbel probability paper (Gumbel 1958) is used because in many cases the distribution of the extremes, each selected from \(r\) events, asymptotically approaches the Gumbel distribution when \(r\) goes to infinity.
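The scale change that produces a straight line can be written out explicitly. In this short derivation the Gumbel location and scale are written as \(\mu\) and \(\beta\), a notation assumed here for illustration. The Gumbel cumulative distribution function is

\[P = F(x) = \exp\left[-\exp\left(-\frac{x - \mu}{\beta}\right)\right]\]

Taking logarithms twice gives a quantity that is linear in \(x\):

\[-\ln(-\ln P) = \frac{x - \mu}{\beta}\]

Plotting \(-\ln(-\ln P)\) against \(x\) therefore yields a straight line with slope \(1/\beta\). Combining this with \(R = 1/(1 - P)\), the value associated with a return period \(R\) is

\[x_R = \mu - \beta \ln\left[-\ln\left(1 - \frac{1}{R}\right)\right]\]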

class skextremes.models.engineering.Harris1996(data=None, ppp='Harris1996', **kwargs)[source]

Calculate extreme values based on yearly maxima using Harris1996 plotting positions and a least-squares fit.

This methodology differs from others in the module in the location of the probability plotting position.

Parameters

data : array_like
Extreme values dataset.
preconditioning : int or float
Exponent applied to the extreme data values before performing the Gumbel curve fit. Preconditioning can often improve the convergence of the curve fit and therefore the estimate of the T-year extreme wind speed. The default value is 1.

Attributes

results : dict
A dictionary containing different parameters of the fit.
c : float
Value of the ‘shape’ parameter. In the case of the Gumbel distribution this value is always 0.
loc : float
Value of the ‘location’ parameter.
scale : float
Value of the ‘scale’ parameter.
distr : frozen scipy.stats.gumbel_r distribution
Frozen distribution of type scipy.stats.gumbel_r with c, loc and scale parameters equal to self.c, self.loc and self.scale, respectively.

Methods

Methods to calculate the fit:

_ppp_harris1996

Methods to plot results:

self.plot_summary()
_ppp_harris1996()[source]

A review of the traditional Gumbel extreme value method for analysing yearly maximum wind speeds or similar data, with a view to improving the process. An improved set of plotting positions based on the mean values of the order statistics is derived, together with a means of obtaining the standard deviation of each position. This enables a fitting procedure using weighted least squares to be adopted, which gives results similar to the traditional Lieblein BLUE process, but with the advantages that it does not require tabulated coefficients, is available for any number of data up to at least 50, and provides a quantitative measure of goodness of fit.
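To make the weighted least-squares step concrete, here is a generic sketch in plain Python. It is not Harris's procedure itself: the plotting positions and standard deviations below are hypothetical placeholders, whereas the real values come from the order statistics derived in the paper. The function name is also hypothetical. Each point is weighted by the inverse variance of its plotting position.

```python
def weighted_line_fit(y, x, weights):
    """Fit x = loc + scale * y by minimising sum(w * residual**2)."""
    w_sum = sum(weights)
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    sxy = sum(w * (yi - y_bar) * (xi - x_bar)
              for w, yi, xi in zip(weights, y, x))
    syy = sum(w * (yi - y_bar) ** 2 for w, yi in zip(weights, y))
    scale = sxy / syy
    loc = x_bar - scale * y_bar
    return loc, scale


# Hypothetical plotting positions (reduced variate) and their standard
# deviations; placeholders only, NOT the values tabulated by Harris (1996).
reduced_variate = [-0.9, -0.3, 0.2, 0.8, 1.6, 2.6]
std_dev = [0.40, 0.45, 0.50, 0.60, 0.80, 1.20]
maxima = sorted([21.0, 24.5, 19.5, 23.0, 27.8, 25.9])  # made-up data

weights = [1.0 / s ** 2 for s in std_dev]  # inverse-variance weights
loc, scale = weighted_line_fit(reduced_variate, maxima, weights)
print(loc, scale)
```

Because the largest values have the most uncertain plotting positions, inverse-variance weights reduce their influence on the fitted line.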

References

Harris RI, (1996), ‘Gumbel re-visited – a new look at extreme value statistics applied to wind speeds’, Journal of Wind Engineering and Industrial Aerodynamics, 59, 1-22.
plot_summary()

Summary plot including PP plot, QQ plot, empirical and fitted pdf and return values and periods.

Returns

4-panel plot including PP, QQ, pdf and return level plots

class skextremes.models.engineering.Lieblein(data=None, ppp='Lieblein', **kwargs)[source]

Calculate extreme values based on yearly maxima using Lieblein plotting positions and a least-squares fit.

This methodology differs from others in the module in the location of the probability plotting position.

Parameters

data : array_like
Extreme values dataset.
preconditioning : int or float
Exponent applied to the extreme data values before performing the Gumbel curve fit. Preconditioning can often improve the convergence of the curve fit and therefore the estimate of the T-year extreme wind speed. The default value is 1.

Attributes

results : dict
A dictionary containing different parameters of the fit.
c : float
Value of the ‘shape’ parameter. In the case of the Gumbel distribution this value is always 0.
loc : float
Value of the ‘location’ parameter.
scale : float
Value of the ‘scale’ parameter.
distr : frozen scipy.stats.gumbel_r distribution
Frozen distribution of type scipy.stats.gumbel_r with c, loc and scale parameters equal to self.c, self.loc and self.scale, respectively.

Methods

Methods to calculate the fit:

_ppp_lieblein

Methods to plot results:

self.plot_summary()
_ppp_lieblein()[source]

Lieblein-BLUE (Best Linear Unbiased Estimator) method to obtain extreme values using a Type I (Gumbel) extreme value distribution.

It calculates extremes with the classical methodology of Julius Lieblein. It exists mainly to check how several consultants in the wind energy industry calculated wind speed extremes.

It calculates extremes by fitting a Gumbel distribution with a least-squares fit, considering several probability plotting positions used in the wild.

References

Lieblein J, (1974), ‘Efficient methods of Extreme-Value Methodology’, NBSIR 74-602, National Bureau of Standards, U.S. Department of Commerce.
plot_summary()

Summary plot including PP plot, QQ plot, empirical and fitted pdf and return values and periods.

Returns

4-panel plot including PP, QQ, pdf and return level plots

class skextremes.models.engineering.PPPLiterature(data=None, ppp='Weibull', **kwargs)[source]

Calculate extreme values based on yearly maxima using several plotting positions and a least-squares fit.

This methodology differs from others in the module in the location of the probability plotting position.
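The plotting positions offered by this class can be compared side by side with a small sketch. It assumes that every formula has the rank-based form \(P = (m - a)/(N + b)\), with \(m\) the rank in increasing order and \(N\) the sample size. The dictionary and function names are hypothetical, and the constants are transcribed from the method descriptions in this class, with the Weibull entry taken as the standard \(m/(N + 1)\).

```python
# (a, b) constants for P = (m - a) / (N + b); keys mirror the _ppp_* suffixes.
PLOTTING_CONSTANTS = {
    "adamowski": (0.25, 0.5),
    "beard": (0.31, 0.38),
    "blom": (0.375, 0.25),
    "gringorten": (0.44, 0.12),
    "hazen": (0.5, 0.0),
    "hirsch": (-0.5, 1.0),
    "iec56": (0.5, 0.25),
    "landwehr": (0.35, 0.0),
    "laplace": (-1.0, 2.0),
    "mm": (0.4, 0.0),
    "tukey": (1.0 / 3.0, 1.0 / 3.0),
    "weibull": (0.0, 1.0),
}


def plotting_positions(n, method="weibull"):
    """Return P for ranks m = 1..n under the chosen formula."""
    a, b = PLOTTING_CONSTANTS[method]
    return [(m - a) / (n + b) for m in range(1, n + 1)]


print(plotting_positions(4, "hazen"))  # [0.125, 0.375, 0.625, 0.875]
for name in sorted(PLOTTING_CONSTANTS):
    # Probability assigned to the largest of 20 values under each formula.
    print(name, round(plotting_positions(20, name)[-1], 4))
```

The formulas differ mostly in the probability assigned to the largest and smallest values, which is where extrapolated return periods are most sensitive.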

Parameters

data : array_like
Extreme values dataset.
preconditioning : int or float
Exponent applied to the extreme data values before performing the Gumbel curve fit. Preconditioning can often improve the convergence of the curve fit and therefore the estimate of the T-year extreme wind speed. The default value is 1.

Attributes

results : dict
A dictionary containing different parameters of the fit.
c : float
Value of the ‘shape’ parameter. In the case of the Gumbel distribution this value is always 0.
loc : float
Value of the ‘location’ parameter.
scale : float
Value of the ‘scale’ parameter.
distr : frozen scipy.stats.gumbel_r distribution
Frozen distribution of type scipy.stats.gumbel_r with c, loc and scale parameters equal to self.c, self.loc and self.scale, respectively.

Methods

Methods to calculate the fit:

_ppp_adamowski

_ppp_beard

_ppp_blom

_ppp_gringorten

_ppp_hazen

_ppp_hirsch

_ppp_iec56

_ppp_landwehr

_ppp_laplace

_ppp_mm

_ppp_tukey

_ppp_weibull

Methods to plot results:

self.plot_summary()
_ppp_adamowski()[source]

Perform the calculations using the Adamowski plotting positions.

For the value of rank \(m\) (in increasing order) out of \(N\), the probability position is defined as:

\[P = \frac{m - 0.25}{N + 0.5}\]

References

De, M., 2000. A new unbiased plotting position formula for gumbel distribution. Stochastic Envir. Res. Risk Asses., 14: 1-7.
_ppp_beard()[source]

Perform the calculations using the Beard plotting positions.

For the value of rank \(m\) (in increasing order) out of \(N\), the probability position is defined as:

\[P = \frac{m - 0.31}{N + 0.38}\]

References

De, M., 2000. A new unbiased plotting position formula for gumbel distribution. Stochastic Envir. Res. Risk Asses., 14: 1-7.
_ppp_blom()[source]

Perform the calculations using the Blom plotting positions.

For the value of rank \(m\) (in increasing order) out of \(N\), the probability position is defined as:

\[P = \frac{m - 0.375}{N + 0.25}\]

References

De, M., 2000. A new unbiased plotting position formula for gumbel distribution. Stochastic Envir. Res. Risk Asses., 14: 1-7.
_ppp_gringorten()[source]

Perform the calculations using the Gringorten plotting positions.

For the value of rank \(m\) (in increasing order) out of \(N\), the probability position is defined as:

\[P = \frac{m - 0.44}{N + 0.12}\]

References

Adeboye, O.B. and M.O. Alatise, 2007. Performance of probability distributions and plotting positions in estimating the flood of River Osun at Apoje Sub-basin, Nigeria. Agric. Eng. Int.: CIGR J., Vol. 9.
_ppp_hazen()[source]

Perform the calculations using the Hazen plotting positions.

For the value of rank \(m\) (in increasing order) out of \(N\), the probability position is defined as:

\[P = \frac{m - 0.5}{N}\]

References

Adeboye, O.B. and M.O. Alatise, 2007. Performance of probability distributions and plotting positions in estimating the flood of River Osun at Apoje Sub-basin, Nigeria. Agric. Eng. Int.: CIGR J., Vol. 9.
_ppp_hirsch()[source]

Perform the calculations using the Hirsch plotting positions.

For the value of rank \(m\) (in increasing order) out of \(N\), the probability position is defined as:

\[P = \frac{m + 0.5}{N + 1}\]

References

Jay, R.L., O. Kalman and M. Jenkins, 1998. Integrated planning and management for Urban water supplies considering multi uncertainties. Technical Report, Department of Civil and Environmental Engineering, Universities of California.
_ppp_iec56()[source]

Perform the calculations using the IEC56 plotting positions.

For the value of rank \(m\) (in increasing order) out of \(N\), the probability position is defined as:

\[P = \frac{m - 0.5}{N + 0.25}\]

References

Fothergill, J.C., 1990. Estimating the cumulative probability of failure data points to be plotted on Weibull and other probability paper. Electr. Insulation Transact., 25: 489-492.
_ppp_landwehr()[source]

Perform the calculations using the Landwehr plotting positions.

For the value of rank \(m\) (in increasing order) out of \(N\), the probability position is defined as:

\[P = \frac{m - 0.35}{N}\]

References

Makkonen, L., 2008. Problem in the extreme value analysis. Structural Safety, 30: 405-419.
_ppp_laplace()[source]

Perform the calculations using the Laplace plotting positions.

For the value of rank \(m\) (in increasing order) out of \(N\), the probability position is defined as:

\[P = \frac{m + 1}{N + 2}\]

References

Jay, R.L., O. Kalman and M. Jenkins, 1998. Integrated planning and management for Urban water supplies considering multi uncertainties. Technical Report, Department of Civil and Environmental Engineering, Universities of California.
_ppp_mm()[source]

Perform the calculations using the McClung and Mears plotting positions.

For the value of rank \(m\) (in increasing order) out of \(N\), the probability position is defined as:

\[P = \frac{m - 0.4}{N}\]

References

Makkonen, L., 2008. Problem in the extreme value analysis. Structural Safety, 30: 405-419.
_ppp_tukey()[source]

Perform the calculations using the Tukey plotting positions.

For the value of rank \(m\) (in increasing order) out of \(N\), the probability position is defined as:

\[P = \frac{m - 1/3}{N + 1/3}\]

References

Makkonen, L., 2008. Problem in the extreme value analysis. Structural Safety, 30: 405-419.
_ppp_weibull()[source]

Perform the calculations using the Weibull plotting positions.

For the value of rank \(m\) (in increasing order) out of \(N\), the probability position is defined as:

\[P = \frac{m}{N + 1}\]

References

Hyndman, R.J. and Y. Fan, 1996. Sample quantiles in statistical packages. Am. Stat., 50: 361-365.
plot_summary()

Summary plot including PP plot, QQ plot, empirical and fitted pdf and return values and periods.

Returns

4-panel plot including PP, QQ, pdf and return level plots